13 May, 2020

Introduction

South Korea during COVID-19

  • One of the world’s most densely populated countries

  • 51.64 million inhabitants

  • First case of COVID-19 confirmed on the 20th of January 2020

  • 259 deaths caused by COVID-19

Research questions

  • How is human behaviour driving the spread of the disease?

  • How has the epidemic evolved in South Korea?

  • Is there any correlation between the place of infection and severity of the disease?

  • Does gender or age predispose people to getting the disease or to a more severe outcome?

  • Can characteristic city features be used to predict the burden of disease?

Materials and methods

Workflow and Structure of the project

Reproducibility

  • The project includes all steps of the data analysis
  • This ensures consistent computational results

Data cleaning

  • Remove invalid data (NAs)

  • Remove unnecessary columns

  • Convert data into the tidy format:

    • Each variable has its own column

    • Each observation has its own row

    • Each value has its own cell
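The three tidy-format rules can be illustrated with a small reshaping sketch. It is written in plain Python rather than the R tooling the slides refer to, and the table, the column names and the helper `to_tidy` are hypothetical, not taken from the project data.

```python
# Hypothetical wide table: one column per gender (not the project's real data).
wide_rows = [
    {"date": "2020-03-02", "male": 10, "female": 12},
    {"date": "2020-03-03", "male": 11, "female": 15},
]


def to_tidy(rows, id_col, value_cols, names_to, values_to):
    """Reshape wide rows into tidy (long) rows: one observation per row."""
    tidy = []
    for row in rows:
        for col in value_cols:
            tidy.append({id_col: row[id_col], names_to: col, values_to: row[col]})
    return tidy


# Each variable (date, sex, confirmed) now has its own column,
# and each date/sex observation has its own row.
tidy_rows = to_tidy(wide_rows, "date", ["male", "female"], "sex", "confirmed")
```

In the wide form, the variable "sex" is hidden in the column names; after reshaping it becomes an explicit column.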

Data augmenting

  • Joining dataset tables using full_join

  • Subsetting data

  • Combining columns using unite

  • Creating new variables for the analysis
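A minimal sketch of what the joining and column-combining steps do, assuming semantics similar to the `full_join` and `unite` functions named above (keep all rows from both tables; paste columns together with a separator and drop the inputs). It is plain Python, not the project's code, and the tables and values are hypothetical.

```python
def full_join(left, right, key):
    """Full outer join of two lists of dicts on `key`; missing values become None."""
    all_cols = {c for r in left for c in r} | {c for r in right for c in r}
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)

    joined, matched = [], set()
    for l in left:
        matches = right_by_key.get(l[key])
        if matches:
            matched.add(l[key])
            for r in matches:
                joined.append({**dict.fromkeys(all_cols), **l, **r})
        else:
            # Left row without a partner: right-hand columns stay None.
            joined.append({**dict.fromkeys(all_cols), **l})
    for r in right:
        if r[key] not in matched:
            # Right row without a partner: left-hand columns stay None.
            joined.append({**dict.fromkeys(all_cols), **r})
    return joined


def unite(rows, new_col, cols, sep="_"):
    """Combine several columns into one string column and drop the inputs."""
    out = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k not in cols}
        rest[new_col] = sep.join(str(row[c]) for c in cols)
        out.append(rest)
    return out


# Hypothetical example tables (illustrative values only).
patient_info = [
    {"patient_id": 1, "sex": "female", "age": "20s"},
    {"patient_id": 2, "sex": "male", "age": "50s"},
]
patient_route = [
    {"patient_id": 1, "city": "Seoul"},
    {"patient_id": 3, "city": "Busan"},
]

joined = full_join(patient_info, patient_route, "patient_id")
united = unite(patient_info, "group", ["sex", "age"])
```

A full join keeps patients that appear in only one table, which is why invalid (missing) values still need handling after the join.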

Final datasets

  • Case data (Case)

  • Patient data (Patient info + Patient route)

  • Time data (Time + Time age + Time gender + Time province + SearchTrend)

  • City data (Region + Patient info)

Results

How is human behaviour driving the spread of the disease?

How has the epidemic evolved in South Korea?

Does gender or age predispose people to getting the disease or to a more severe outcome?

Is there any correlation between the place of infection and severity of the disease?

How is human behaviour driving the spread of the disease?

Can characteristic city features be used to predict the burden of disease?

score_org score_pca
42.5% 49.6%

Can characteristic city features be used to predict the burden of disease?

ANN accuracy: 46.4%

Shiny app

Conclusion and discussion

  • The number of confirmed cases is higher than the number of deaths.

  • There is no correlation between the place of infection and severity of the disease.

  • More men die, but more women are confirmed to be infected.

  • Young people are driving the spread.

  • People in their 70s and 80s have a higher fatality rate.

  • There are clusters of superspreaders within certain age ranges.

  • Accuracy is just under 50% (42.5% to 49.6%), which is better than random guessing with 4 classes; the ANN performs similarly to k-means.
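The comparison with random guessing can be made explicit with a short calculation. This sketch assumes the 4 classes are roughly balanced, so that chance-level accuracy is 1/4; the slides do not state the class distribution. The accuracies are the ones reported in the results.

```python
n_classes = 4
chance = 1 / n_classes  # chance-level accuracy, assuming balanced classes

# Accuracies as reported in the results slides.
reported = {"score_org": 0.425, "score_pca": 0.496, "ANN": 0.464}

# How many times better than chance each model is.
lift = {name: acc / chance for name, acc in reported.items()}

for name, value in lift.items():
    print(f"{name}: {reported[name]:.1%} accuracy, {value:.2f}x chance level")
```

If the classes are imbalanced, a fairer baseline is the share of the largest class, which would be higher than 25%.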

Superspreaders

Correlation matrix

PCA variance explained

Regional cases plot